Fusion of Multiple Features and Ranking SVM for Web-based English-Chinese OOV Term Translation
Abstract
This paper addresses Web-based English-Chinese out-of-vocabulary (OOV) term translation, with particular emphasis on a translation selection strategy that fuses multiple features and a ranking mechanism based on the Ranking Support Vector Machine (Ranking SVM). Experiments on terms drawn from the CoNLL2003 corpus for the English Named Entity Recognition (NER) task and on a set of selected new terms show consistent results across the different data sources. The proposed OOV term translation model is better able to select ("filter") the most probable translation candidates. When the model is combined with English-Chinese Cross-Language Information Retrieval (CLIR) on data sets from the Text REtrieval Conference (TREC), clear performance improvements are obtained for both query translation and retrieval.
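The abstract does not list the features or the training setup, so the following is only a rough sketch of how a ranking mechanism of this kind can work in general, not the authors' implementation. It trains a linear Ranking SVM by reducing ranking to classification of feature-difference vectors between candidates of the same source term, then orders new candidates by the fused score. The three features (co-occurrence frequency, transliteration similarity, length-ratio fit), the toy values, and all function names are assumptions made for illustration.

```python
import random


def pairwise_differences(groups):
    """Build difference vectors x_a - x_b for pairs where a should outrank b.

    groups: one list per source term; each item is (features, relevance).
    Only candidates belonging to the same source term are compared.
    """
    diffs = []
    for candidates in groups:
        for feats_a, rel_a in candidates:
            for feats_b, rel_b in candidates:
                if rel_a > rel_b:
                    diffs.append([x - y for x, y in zip(feats_a, feats_b)])
    return diffs


def train_ranking_svm(groups, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit linear weights by subgradient descent on the pairwise hinge loss."""
    diffs = pairwise_differences(groups)
    dim = len(diffs[0])
    w = [0.0] * dim
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            for i in range(dim):
                # L2 regularisation always applies; the hinge term only
                # contributes while the pair is not separated by margin 1.
                grad = reg * w[i] - (d[i] if margin < 1.0 else 0.0)
                w[i] -= lr * grad
    return w


def rank_candidates(w, candidates):
    """Sort (name, features) candidates by descending fused score w . x."""
    def score(feats):
        return sum(wi * xi for wi, xi in zip(w, feats))
    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)


# Hypothetical toy data: [co-occurrence, transliteration similarity, length fit],
# with relevance 1 for the correct translation and 0 otherwise.
training_groups = [
    [([0.9, 0.8, 0.7], 1), ([0.4, 0.3, 0.5], 0), ([0.2, 0.1, 0.9], 0)],
    [([0.7, 0.9, 0.6], 1), ([0.5, 0.2, 0.4], 0)],
]

weights = train_ranking_svm(training_groups)
ranked = rank_candidates(
    weights,
    [("cand_b", [0.3, 0.2, 0.6]), ("cand_a", [0.8, 0.7, 0.5])],
)
print([name for name, _ in ranked])
```

Comparing candidates only within the same source term mirrors the usual Ranking SVM setup, where relevance labels are meaningful relative to one query rather than across queries.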
Similar resources
Improved Cross-language Information Retrieval via Disambiguation and Vocabulary Discovery
Cross-lingual information retrieval (CLIR) allows people to find documents irrespective of the language used in the query or document. This thesis is concerned with the development of techniques to improve the effectiveness of Chinese–English CLIR. In Chinese–English CLIR, the accuracy of dictionary-based query translation is limited by two major factors: translation ambiguity and the presence ...
English-Chinese Bi-Directional OOV Translation based on Web Mining and Supervised Learning
In Cross-Language Information Retrieval (CLIR), Out-of-Vocabulary (OOV) detection and translation pair relevance evaluation remain key problems. In this paper, an English-Chinese Bi-Directional OOV translation model is presented, which uses Web mining as the corpus source to collect translation pairs and applies supervised learning to evaluate their degree of association. The experim...
Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection
Cross-Language Information Retrieval (CLIR) systems use dictionaries for information retrieval. However, out-of-vocabulary (OOV) terms cannot be found in dictionaries. Although many researchers have endeavored to solve the OOV term translation problem, little attention has been paid to hybrid translations such as “α1-antitrypsin deficiency (α1-抗胰蛋白酶缺乏症)”. This paper presents a novel OOV ...
RMIT Chinese-English CLIR at NTCIR-4
We participated in the Chinese-English CLIR task, concentrating primarily on translation disambiguation and automatic translation extraction of OOV terms. A new technique to identify and translate Chinese OOV terms using the web was developed. The results for this aspect of our work appear promising.
Fusion of Multiple Features and Supervised Learning for Chinese OOV Term Detection and POS Guessing
In this paper, to support more precise Chinese Out-of-Vocabulary (OOV) term detection and Part-of-Speech (POS) guessing, a unified mechanism is proposed and formulated based on the fusion of multiple features and supervised learning. Besides all the traditional features, the new features for statistical information and global contexts are introduced, as well as some constraints and heuristic ru...